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Abstract 

Kaltofen has proposed a new approach in (Kaltofen, 1992) for computing matrix determinants 
without divisions. The algorithm is based on a baby steps/giant steps construction of Krylov 
subspaces, and computes the determinant as the constant term of a characteristic polynomial. 
For matrices over an abstract ring, by the results of Baur and Strassen (1983), the determinant 
algorithm, actually a straight-line program, leads to an algorithm with the same complexity 
for computing the adjoint of a matrix. However, the latter adjoint algorithm is obtained by 
the reverse mode of automatic differentiation, hence somehow is not "explicit". We present an 
alternative (still closely related) algorithm for the adjoint that can be implemented directly, we 
mean without resorting to an automatic transformation. The algorithm is deduced by applying 
program differentiation techniques "by hand" to Kaltofen's method, and is completely decribed. 
As subproblem, we study the differentiation of programs that compute minimum polynomials of 
lineraly generated sequences, and we use a lazy polynomial evaluation mechanism for reducing 
the cost of Strassen's avoidance of divisions in our case. 

Key words: matrix determinant, matrix adjoint, matrix inverse, characteristic polynomial, 
exact algorithm, division-free complexity, Wiedemann algorithm, automatic differentiation. 



Introduction 



Kaltofen has proposed in ([Kaltofenl . 1992) a new approach for computing matrix de- 
terminants. This approach has brought breakthrough ideas for improving the complexity 
estimate f or the problem of computing the determinant without divisions over an abstract 
ring (see (jKaltofenl. Il992t IKaltofen and VillardL 12004 ). With these foundations, the al- 
gorithm of iKaltofen and Villardl (|2004h computes the determinant in 0(n 2 7 ) additions, 
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subtractions, and multipli cations. The same ideas also lead to the currently best known 
bit complexity estimate of lKaltofen and Villardl (|2004l ) for the proble m of computing the 
characteristic polynomial. 

We consider the straigth-line programs of lKaltofen! (|l992T ) for computing the determi- 
nant over abstract fields or ri ngs (with or without divi sions). Using the reverse mod e of 
automatic differentiation (see Linnainmaa ( 1970l 19761 ). and ( Ostrowski et all Il97ll )). a 
straight-line program for computing the determinant of a matrix A can be (automatically) 
transform ed into a program f or com puting the adjoint matrix A* of A. This principle, 
stated by iBaur and Strasse 3 (|1983L Cor. 5), is also applied by iKaltofenl (|l99l Sec. 1.2) 
for computing A*. Since the adjoint program is derived by an automatic process, few is 
known about the way it computes the adjoint. The only available information seems to 
be the determinant program itself, and the knowledge we have on the differentiation pro- 
cess. Neither the adjoint program can be described, or implemented, without resorting 
to an automatic differentiation tool. 

In this paper, by studying the differentiation of Kaltofcn's determinant algorithm step 
by step, we produce an "explicit" adjoint algorithm. The determinant algorithm, that 
we first recall in Section 2 over an abstract field K, uses a Krylov subspace construction, 
hence mainly reduces to vector times matrix, and matrix times matrix products. An- 
other operation involved is computing the minimum polynomial of a linearly generated 
sequence. We apply the program differentiation mechanism, reviewed in Section 3, to the 
different steps of the determinant program in Section 4. This leads us to the description 
of a corresponding new adjoint program over a field, in Section 5. T he algorithm we 
obtain somehow calls to mind the matrix factorization of lEberlv (|1997L (3.4)). We note 
that our objectives are similar to Eberly's ones, whose questi on was to give an explici t 
inversion algorithm from the parallel determinant algorithm of lKaltofen and Pan! ( 1991 ). 

Our motivation for studying the differenti ation and resulti ng adj oint algorithm, is the 
i mpor tance of the determinant approach of Kaltofen (1992), and iKaltofen and Villard 
(|2004l ). for various complexity esti mates. Recent advan ce s around the determ i nant o f 
polynomial or integer m atrices (see Eberlv et all (|200fj|); [ Kaltofen and Villardl (|2004 ): 
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20031 120051) ). and matrix inversion (see IJeannerod and Villardl (|2006l ). and 
Storiohannl (|2008f) ) also justify the study of the general adjoint problem. 



For computing the de terminant witho ut divisions over a ring R, Kaltofen applies the 
avoidance of divisions of Strassenl ( 1973f ) to his determinant algorithm over a field. We 
apply the same strategy for the adjoint. From the algorithm of Section 5 over a field, 
we deduce an adjoint algorithm over an arbitrary ring R in Section 6. The avoidance of 
divisions involves computations with truncated power series. A crucial point in Kaltofen's 
approach is a "baby steps/giant steps" scheme for reducing the corresponding power 
series arithmetic cost. However, since we use the reverse mode of differentiation, the flow 
of computation is modified, and the benefit of the baby steps/giant steps is partly lost for 
the adjoint. This asks us to introduce an early, and lazy polynomial evaluation strategy 
for not increasing the complexity estimate. 

The division-free determinant algorithm of lKaltofen (|1992|) uses 0~(n 3 5 ) operations in 
R. The adjoint algorithm we propose has essentially the same cost. Our study may be seen 
as a fi rst step for the differentiation of the more efficient algorithm of lKaltofen and Villard 
(|2004l ). The latter would require, in particular, to consider asymptotically fast matrix 
multiplication algorithms that are not discussed in what follows. 
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Especially in our matrix context, we note that interpreting programs obtained by 
automatic differentiation, may have connections with the interpretation of programs de- 
rived using the transposition principle. We refer for instance to the discussion of lKaltofen 
II2000L Sec. 6). 



Cost functions. We let M(n) be such that two univariate polynomials of degree n 
over an arbitrary ring R can be multiplied using M(n) operations in R. The algorithm 
of lCantor and Kaltofen] (|l99lh allows M(n) = O(nlognloglogn). The function 0(M(n)) 
also measures the cost of truncated power series arithmetic over R. For bounding the cost 
of polynomial gcd-type computations over a commutative field K we define the function 
G. Le t G(n) be such that the extended gcd problem (see ( von zur Gathen and Gerhard! . 
1999L Chap. 11)) can be solved with G(n) operations in K for polynomials of degree 2n in 
K\x]. The recursive Knuth/Schonhage half-Gcd algorithm (see (Knuth, 19701 : Schonhage . 
19711 iMoenckl Il973h ) allows G(n) = 0(M( n) log n) . The minimum polynomial of de- 
gree n, of a linearly generated sequence given by its first 2n terms, can be computed in 
G(n) + 0(n) operations (see (jvon zur Gathen and Gerhard fl99flL Algorithm 12.9)). Wc 
will often use the notation 0~ that indicates missing factors of the form a(logn)", for 
two positive real numbers a and (3. 



2. Kaltofen's determinant algorithm over a field 



Kaltofen's determinant algorithm extends the Krylov-based method of IWiedemann 
(Il986h . The l atter approach is successf ul in various situations. We r e fer es pecially to the 
algorithms of Kaltofen and Pan ( 1991 ) and Kaltofen and Saunders! ( 1991 ) around exact 
linear system solution that has served as basis for subsequent works. We may also point 
out the various questions investigated bv lChen et al. ( 2002 ). and references therein. 



v 



introduce the Hankel matrix H . ,. , 

< k < 2n — 1. We also assume that H is non-singular 



(uA 



i+j-2 



G K" xn , and let hi 



uA v for 



dct H = det 



uAv uA v 



uA r - 



uA n v 



1A 2 



^0. 



(1) 



In the applications, (1) is e nsured either by construction of A, u, and v, as in (jKaltofen . 
1992t iKaltofen and Villardl . I2004T ) , or by randomizat ion (see the above cit ed references 
around Wiedemann's approach, and riKaltofenLflflfjj Kaltofen and villardl . lioolr n. 

One of the key ideas of Kaltofenl ( 19921 ) for reducing the division- free complexity esti- 
mate for computing the determinant, is to introduce a "baby steps/giant steps" behaviour 
in the Krylov subspace construction. With baby steps/giant steps parameters r = \2n/s~\ 
and s = \s/n\ (rs > 2n) we consider the following algorithm. 
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Algorithm Det 

Input: A £ K" x ",m£ K lxn ,v £ K nxl 

step i. «o := w; For i = 1, . . . , r — 1 do Vi := Avi-i 
STEP II. B := A r 

step III. u :— u; For j = 1, . . . , s — 1 do Uj := Uj-\B 
step iv. For i = 0, 1, . . . , r — 1 do 

For j = 0, 1, . . . , s — 1 do hi + j r := UjVi 
step v. / := the minimum polynomial of {/ifc}o<fc<2n-i 

Output: det A := (-l) n /(0). 
We ommit the proof of ne xt theorem that establishes the correctness and the cost of 



Algorithm Det, and refer to iKaltofenl (|1992l ). We may simply note that the sequence 



{hk}o<k<2n-i is linearly generated. In addition, if (1) is true, then the minimum polyno- 
mial / of {hk}o<k<2n~i, the minimum polynomial of A, and the characteristic (monic) 
polynomial of A coincide. Hence (— l) n /(0) is equal to the determinant of A. Via an 
algorithm that can multiply two matrices of K™ xn in 0(n u ) we have: 

Theorem 1. If A £ K nx ™. u £ K lxn , and v £ K" xl satisfy (1), then Algorithm Det 
computes the determinant of A in 0(n"logn) operations in K. 

For the matrix product we may set lj = 3, or u — 2.376 using the algorithm of lCoppersmith and Winogradl 
(Il990h . ! n the rest of the paper we work with a cubic matrix multiplication algorithm. 



Our study has to be generalized if fast matrix multiplication is introduced. 



Backward automatic differentiation 



The determinant of A € K™ x ™ is a polynomial A in Kfai^, . . . , a^.j, . . . , a„.„] of the 
entries of A. We denote the adjoint matrix by A* such that AA* — A* A = (det A)I. As 
noticed bv lBaur and Strassenl |l983), the entries of A* satisfy 



<9A 

da,j 



, 1 < i, j < n. 



(2) 



The reverse mode of automatic differentiation allows to transform a program which 
computes A into a program which computes all the partial derivatives in (2). Among the 
rich literature a bout the reve r se mode of auto matic differentiat i on we may refer to the 
seminal works of Linnainmaa ( 197Ct 19761 ) and Ostrowski et al. ( 197ll). For deriving the 
adjoin t pro gram from th e deter minant program we follow the lines of lBaur and Strassen 
(|l983l ) and lMorgensternl (| 19851) . 

Algorithm Det is a straight-line pr ogram over K. For a comprehensive study of 



straight-line programs for instance see (jBurgisser et all 119971 Chapter 4) . We assume 
that the entries of A are stored initially in n 2 variables 5i, — n 2 < i < 0. Then we as- 
sume that the algorithm is a sequence of arithmetic operations in K, or assignments to 
constants of K. Let L be the number of such operations. We assume that the result of 
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each instruction is stored in a new variable Si, hence the algorithm is seen as a sequence 
of instructions 

6i := 5j op 6k, op G {+,-, X,-=-}, -n 2 <j,k<i, (3) 

or 

<5, := c, c G K, (4) 
for 1 < i < L. Note that a binary arithmetic operation (3) where one of the operands is a 
constant of K can be implemented with the aid of (4). For any < i < L, the determinant 
maybe be seen as a rational function Aj of <5_„2 +1 , . . . , 5i, such that 

A (£_ n 2 +1 , ...,S ) = A(a M , . . . , a n<n ), (5) 

and such that the last instruction gives the result: 

detA = 5 L = A L (5_„2 +1 ,...,5 L ). (6) 

The reverse mode of automatic differentiation computes the derivatives (2) in a back- 
ward recursive way, from the derivatives of (6) to those of (5). Using (6) we start the 
recursion with 

^ = 1,^ = 0, -.■<,<!-!. 
dS L ddi 

Then, writing 

Ai-i(5_„2 +1 , . . . , <S*_i) = Ai(6_ n 2 +1 , ...,6i) = Ai(<5_„2 +1 , . . .,g(5j,5k)), (7) 
where g is given by (3) or (4), we have 

-^ = ^r + j^rii> -n 2 <i<i-i, s 

ddi ddi ddi ddi 

for 1 < i < L. Depending on g several cases may be examined. For instance, for an 
addition Si :— g{Sk, Sj) = 5k + Sj, (8) becomes 

<9Ai_i dAi dAi 9Ai_i dA, OA, 



85k dSk dSi ' dSj dSj OS 



(9) 



with the other derivatives (/ 7^ k or j) remaining unchanged. In the case of a multipli- 
cation Si :— g(Sk,Sj) — 6^ x Sj, (8) gives that the only derivatives that are modified 
are 

0Aj-i _ OAj , dAj x dAi-i_dAi dAi (m , 

ds k ~ ds k + ds % °" ds 3 ~ os, + d6i k - [ ' 

We see for instance in (10), where Sj is used for updating the derivative with respect 
to 6k, that the recursion uses intermediary results of the determinant algorithm. For the 
adjoint algorithm, we will assume that the determinant algorithm has been executed 
once, and that the Si's are stored in n 2 + L memory locations. 

Recursion (8) gives a practical mean, and a program, for computing the N = n 2 
derivatives of A with respect to the a^j's. For any rational function Q in N variables 
6—N+1, ■ ■ ■ ,60 the corresponding general statement is: 

Theorem 2. \Baur and Strass Let V be a straight-line program computing Q 

in L operations in K. One can derive an algorithm dV that computes Q and the N partial 
derivatives dQ/dSi in less than 5L operations in K. 
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Combining Theorem 2 with Theorem 1 gives the construction of an algorithm <9Det 
for computing the adjoint matrix A* (see (|Baur and StrassenL fl983l Corollary 5)). The 
algorithm can be generated automatically via an automatic differentiation too|_j. How- 
ever, it seems unclear how it could be programmed directly, and, to our knowledge, it 
has no interpretation of its own. 



4. Differentiating the determinant algorithm over a field 



We apply the backward recursion (8) to Algorithm Det of Section 2 for deriving 
the algorithm i9Det. We assume that A is non-singular, hence A* is non-trivial. By 
construction, the flow of computation for the adjoint is reversed compared to the flow of 
Algorithm Det, therefore we start with the differentiation of step v. 

4-1. Differentiation of the minimum polynomial constant term computation 

At step v, Algorithm Det computes the minimum polynomial / of the linearly 
generated sequence {/ifc}o<fe<2n-i- Let A be the first instruction index at which all the 
/ifc's are known. We apply the recursion until step A, globally, we mean that we compute 
the derivatives of A\. After the instruction A, the determinant is viewed as a function 
A v of the /ife's only. Following (7) we have 

det (A) = A A (5_„2 +1 , ...,5x) = A v (/n, . . .,h 2n -i)- 

Hence we may focus on the derivatives dAy/dhk, < k < In — 1, the remaining ones 
are zero. 



Using assumption (1 


we 


degree n, and if f(x) 


fo + 




' fo ~ 






h 




If 








fn-l 





ho hi 
hi h 2 

h n -i . . . 



+ fn-lX n ~ 

hn-i 

lln 



then / satisfies 





" fo ' 




h n 




fl 




h n +i 




. f n - x _ 




/l2n-l 



(11) 



see, e.g., (iKaltofenl. ll992Tl or (I von zur Gathen and Gerhard! . I1999L Algorithm 12.9) to- 
gether with (jBrent et all ll980T ). Applying Cramer's rule we see that 



/o = (-l) n det 



hi h 2 ■ ■ ■ h n 

hi h-i ... h n +i 



I det H, 



hence, defining H A = (uA l+: > 1 v) 1<i j<n = {hi+j-i)i<i,j<n e K nx ™, we obtain 

detH A 



A x 



detH 



(12) 



1 We refer for instance to http://www.autodiff.org 
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Let IC U and K, v be the Krylov matrices 

t u = [u T , A T u T , . . . , (A T ) n - 1 u T ] T e K" x ", (13) 

and 

!C v = [v 1 Av,...,A n - 1 v}eK nxn . (14) 
Since H — )C U IC V , assumption (1) implies that both K, u and IC V are non-singular. Hence, 
using that A is non-singular, we note that Ha = K. U AJC V also is non-singular. 

For differentiating (12), let us first specialize (2) to Hankel matrices. We denote by 
(dA/dai,j)(H) the substitution of the a,j's for the entries of H in dA/ddij, for 1 < 
i,j < n. From (2) we have 

dA 

h*,i = ^—(H),l<i,j<n. 

Since the entries of H are constant along the anti-diagonals, we deduce that 
ddctH ^ dA 



-W = E h h = E Kv < fc < 2n - 2. 



dh k ^— ' ddi , 

i+j-2=k '■> i+j-2=k i+j-2=k 



In other words, we may write 

ddetH 



= a k {H*), < fe< 2n- 1, (15) 



dh k 

where, for a matrix M — (rriij), we define 

cr fc (M) = + l<i,j <n. 

i+j-2=k 

The function Gk{M) is the sum of the entries in the anti-diagonal of M starting with 
mi^+i if < k < n — 1, and mk- n +2,n if n < k < In — 2. Shifting the entries of H for 
obtaining Ha we also have 

dA f HA = a k ^(H* A ), < k < 2n - 1. (16) 

Now, differentiating (12), together with (15) and (16), leads to 

<9A V _ (ddetH A /dh k ) (ddet H/dh k ) det H A _ (ddet H A /dh k ) det H A ( H -i\a 

~dh k ~ ~ detH detH det if ~ detH A detH ' v 
and, consequently, to 

- (o-k-i^ 1 ) - <Tk{H~ 1 )) A v , < k < 2n- 1. (17) 

With (17) we identify the problem solved by the first step of the 9Det algorithm, 
and provide first informations for interpreting or implementing the adjoint program. 
Various algorithms may be used for c omputing the minimum polynomial (for instance see 
( von zur Gathen and Gerhard . 19991 Algorithm 12.9)), that will lead to corresponding 
algorithms for computing the left sides in (17). However, we will not discuss these aspects, 
since the associated costs are not dominant in the overall complexity. 

We have recalled, in the introduction, that the minimum polynomial / (its constant 
term /(0)) can be computed from the h k 's in G(n) + 0(n) operations in K. Hence The- 
orem 2 gives an algorithm for computing the derivatives using 5G(n) + 0(n) operations. 
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Alternatively, in the Appendix we propose a direct approach that takes advantage of (17). 
Proposition 4 shows that if /, H, and Ha are given, then the dA v /dhkS can be computed 
in G(n) + 0(M(n)) operations in K. 

4-2. Differentiation of the dot products 

For differentiating step iv, A is seen as a function A IV of the ufs and vfs. The entries 
of Uj are used for computing the r scalars hj r ,h\ + j r , . . . , h^._ 1 - )+ j r for < j < s — 1. 
The entries of v t are involved in the computation of the s scalars hi, h i+r , . . . , h i+ ( s _ 1 y r 
for < i < r — 1. 

In (8), the new derivative d£\i-\/d8i is obtained by adding the current instruction 
contribution to the previously computed derivative dAi/dSi. Since all the hi+jrS are 
computed independently according to 

n 

h i+ jr = ^2(Uj)l(Vi)l, 
1=0 

it follows that the derivative of A IV with respect to an entry (uj)i or (vi)i is obtained by 
summing up the contributions of the multiplications (uj)i(vi)i. We obtain 



and 



dA IV 

d{uj)i 



gAiy 

d{vi)i 



£ 

i=0 



dA v 
dh 



-{Vi) u 0<j<s-l, l<^<n, 



s_ 9A 

53 Qj^( u jh < * < r - 1, l<Z<n. 



(18) 



(19) 



By abuse of notations (of the sign 9), we let duj be the n x 1 vector, respectively 
<9«i be the 1 x n vector, whose entries are the derivatives of A IV with respect to the 
entries of Uj, respectively Vi. Note that because of the index transposition in (2), it is 
convenient, here and in the following, to take the transpose form (column versus row) 
for the derivative vectors. Defining also 

/ cA v 



8H 



\dhi+j r / o<t<r-l, 0<j<s-l 

we deduce, from (18) and (19), that 

[du , dui, . . . , du s -i] = [v ,vi, . . . , v r -i] dH e K r ' 

and 



€ K r 



dv 




Uq 


dv\ 




Ul 




= 8H 




dv r -i 




Us-l 



(20) 



(21) 



Identities (20) and (21) give the second step of the adjoint algorithm. In Algorithm Det, 
step iv costs essentially 2rsn additions and multiplications in K. Here we have essen- 
tially 4rsn additions and multiplications using basic loops (as in step iv) for calculating 
the matrix products, we mean without an asymptotically fast matrix multiplication al- 
gorithm. 



8 



4-3. Differentiation of the matrix times vector and matrix products 



The recursive process for differentiating step hi to step i may be written in terms 
of the differentiation of the basic operation (or its transposed operation) 

q:=p-MeK lxn , (22) 

where p and q are row vectors of dimension n, and M is an n x n matrix. We assume at 
this point (by construction of the recursion) that column vectors dp and dq of derivatives 
of the determinant with respect to the entries of p and q, are available. For instance, for 
differentiating step hi, we will consider the du/s. We also assume that annxn matrix 
dM, whose transpose gives the derivatives with respect to the m^'s, has been computed. 
Initially, for step III, we will take dB = 0. 

Following the lines of previous section for obtaining (20) and (21), we see that differ- 
entiating (22) amounts to updating dp and dM according to 

dp := dp + M ■ dq e K", 

(23) 

dM :=dM + dq-p£ K nxn . 

Starting from the values of the du/s computed with (20), and from dB = 0, for the 
differentiation of step hi, (23) gives 

duj-i := du-i-i + B ■ duj, 

3 (24) 

dB := dB + duj ■ Uj-i, j = s — 1, . . . , 1. 
For step II, we mean B :— A r , we show that the backward recursion leads to 

r 

dA:=J2 Ar ~ k ■dB-A k ~ 1 . (25) 

fe=i 

Here, the notation dA stands for the nxn matrix whose transpose gives the derivatives 
dA u /daij. We may show (25) by induction on r. For r = 1, dA = dB is true. If (25) is 
true for r — 1, then let C = A r ~ x and B = CA. Using (23), and overloading the notation 
dA, we have 

' dC = A ■ dB e K nx ™, 
dA^dB-C e K nxn . 
Hence, using (25) for r — 1, we establish that 

dA = dA + J2l=\ A r - k - x ■ dC ■ A k -\ 

= dB-C + Yl r k Z\ A r - k - x ■ (A ■ dB) ■ A k ~ x 

= dB ■ A r - X + Y,l=\ A r - k ■ dB ■ A k - X = £Li A ^ k ' dB ' ■ 

Any specific approach for computing A r will lead to an associated program for com- 
puting dA. Let us look, in particular, at the case where step ii of Algorithm Det is 
implemented by repeated squaring, in essentially log 2 r matrix products. Consider the 



9 



recursion 

A Q :=A 

For k = 1, . . . , log 2 r do A 2 k := A 2 k-i ■ A 2 k-i 
B := A r 

that computes B := A r . The associated program for computing the derivatives is 
dA r := dB 

For k = log 2 r, . . . , 1 do dA 2 k-i :— A 2 k-i ■ dA 2 k + dA 2 k ■ A 2 k-i (26) 
dA := dA Q , 

and costs essentially 2 log 2 r matrix products. 

From the values of the dvi's computed with (21), we finally differentiate step I, and 
update dA according to 

dvi-i := dvi-i + dvi- A, 

(27) 

OA := dA + Vi-i ■ dvi, i = r — 1, . . . , 1. 

Now, dA is the n x n matrix whose transpose gives the derivatives dA I /da,ij = 
dA/ddij, hence from (2) we know that A* = dA. 

STEP in and step I both cost essentially r (« s) matrix times vector products. From 
(24) and (27) the differentiated steps both require r matrix times vector products, and 
2rn 2 + 0(rn) additional operations in K. 



5. The adjoint algorithm over a field 

We call Adjoint the algorithm obtained from the successive differentiations of Sec- 
tion 4. Algorithm Adjoint is detailed below. We keep the notations of previous sections. 
We use in addition U € K sxn and V G K" xr (resp. dU € K nxs and dV € K rxn ) for the 
right sides (resp. the left sides) of (20) and (21). 

The cost of Adjoint is dominated by step iv* , which is the differentiation of the ma- 
trix power computation. As we have seen with (26), the number of operation is essentially 
twice as much as for Algorithm Det. The code we give allows an easy implementation. 

We note that if the product by det A is avoided in step i* , then the algorithm computes 
the matrix inverse A -1 . We may put this into perspective with the algorithm given 



bylEberlyj (|l997h . With K u and K v the Krylov matrices of (13) and (14), Eberly has 



proposed a processor-efficient inversion algorithm based on 

A' 1 = K v H A Y t u . (28) 

To see whether a baby steps/giant steps version of (28) would lead to an algorithm similar 
to Adjoint deserves further investigations. 
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Algorithm Adjoint (<9Det) 

Input: A £ K™ xn non-singular, and the intermediary data of Algorithm Det 
All the derivatives are initialized to zero 

step I*. /* Requires the Hankel matrices H and Ha, see (17) */ 

dAy/dhk := (a^iH^) - a k {B- x )) det A, < k < In - 1 

STEP II*. /* Requires the Uj 's and v l 's, see (20) and (21) */ 
dU := V ■ dB 
dV :=dH-U 

step in*. /* Requires B = A r , see (24) */ 
For j = 8 — l,...,l do 

duj-x := duj-i + B ■ duj 
dB := dB + duj ■ Uj-i 

step iv*. /* Requires the powers of A, see (25) or (26) */ 
A* := Y?k=i A r ~ k ■ dB ■ A*- 1 

step v*. /* See (27) */ 

For i = r — 1 , . . . , 1 do 

dvi-i := dvi-i + dvi ■ A 
A* := A* • dvi 

Output: The adjoint matrix A* £ K nX ". 



6. Application to computing the adjoint without divisions 

Now let A be an n x n matrix over an abstract ring R. Kaltofen's algorithm for 
computing the determinant of A without divisions applies Algorithm Det on a well 
chosen univariate polynomial matrix Z(z) = C + z{A — C) where C £ Z nxn , with 
a dedicated choice of projections u = <p £ Z lx " and v = tjj £ Z" x l . Th e algorithm 



uses Strassen's avoidance of divisions (see (ptrassenl . 11973c iKaltofenl . 119921 )). Since the 



determinant of Z is a polynomial of degree n in z, the arithmetic operations over K in 
Det may be replaced by operations on power series in R[[z]} modulo z n+1 . Once the 
determinant of Z(z) is computed, the evaluation (detZ)(l) = det(C + 1 x (A — C)) 
gives the determinant of A. The choice of C, <p and ip is such that, whenever a division 
by a truncated power series is performed the constant coefficients are ±1. Therefore the 
algorithm necessitates no divisions. Note that, by construction of Z(z), the constant 
terms of the power series involved when Det is called with inputs Z(z),tp and ip, are the 
intermediary values computed by Det with inputs C, if and tp- 

The cost for computing the determinant of A without divisions is then deduced as 
follows. In step I and step ii of Algorithm Det applied to Z(z), the vector and ma- 
trix entries are polynomials of degree O(yfn). The cost of step ii dominates, and is 



11 



0(n 3 M(-/nT) logra) = 0\n Z y/n) operations in R. step III, iv, and v cost 0{n 2 y/n) op- 
erations on power series modulo z n+l , that is 0(n 2 M(n)y / n) operations in R. Hence 
det Z(z) is computed in 0\n y/n) operations in R, and det A is obtained with the same 
cost bound. 

An main property of Kaltofen's approach (which also holds for the improved blocked 



version of Kaltofen and Villard (20041)'), is that the scalar value det A is obtained via 



the computation of the polynomial value det Z(z). This property seems to be lost with 
the adjoint computation. We are going to see how Algorithm Adjoint applied to Z(z) 
allows to compute A* £ R nxn in time 0~(n 3 v / n) operations in R, but does not seem to 
allow the computation of Z*(z) € R[z]" x " with the same complexity estimate. Indeed, 
a key point in Kaltofen's approach for reducing the overall complexity estimate, is to 
compute with small degree polynomials (degree O(yfn)) in step i and step II. However, 
since the adjoint algorithm has a reversed flow, this point does not seem to be relevant 
for Adjoint, where polynomials of degree n are involved from the beginning. 

Our approach for computing A* over R keeps the idea of running Algorithm Adjoint 
with input Z(z) = C + z(A — C), such that Z*(z) has degree less than n, and gives 
A* = Z*(X). In Section 6.1, we verify that the implementation using Proposition 4, needs 
no divisions. We then show in Section 6.2 how to establish the cost estimate 0~(n 3 y/n). 
The principle we follow is to start evaluating polynomials at z — 1 as soon as computing 
with the entire polynomials is prohibitive. 

6.1. Division-free Hankel matrix inversion and anti-diagonal sums 

In Algorithm Adjoint, divisions may only occur during the anti-diagonal sums com- 
putation. We verify here that with the matrix Z(z), and the special projections (p G 
Z lx ™, tp £ Z nxl , the approach described in the Appendix for computing the anti-diagonal 
sums requires no divisions. Equivalcntly, since we use Strassen's avoidance of divisions, 
we verify that with the matrix C and the projections tp, ip, the approach necessitates no 



divisions. As we are going to see, this a direct consequence of the construction of lKaltofen 



(Il992h . 

Here we let hk = <pC k ip for < k < 2n — 1, a(x) — x 2n , and b{x) = hox 2 "^ 1 + 
h\x 2n ~ 2 + . . . + h-2n-i- The extended Euclidean scheme with inputs a and b leads to a 



normal sequence, and after n—l and n steps of the scheme, we get (see (|Kaltofenl . 11992 , 
Sec. 2)): 

s(x)a(x) + t(x)b(x) = c(x), with deg s = n — 2, degi = n — 1, dege = n, (29) 

and 

s(x)a(x) + i{x)b{x) = c(x), with deg s = n — 1, degi = n, degc = n—l. (30) 
The polynomial t is such that 

i — ±x n + intermediate monomials + 1 = ±/, (31) 

with / the minimum polynomial of {/ife}o<fc<2n-i- One may check, in particular, that 
the n equations obtained by identifying the coefficients of degree 2n — 1 > k > n in (30) 
give the linear system (11), that defines /. The polynomial c also has leading coefficient 
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±1. By identifying 


the coefficients of dej 


free 2n — 1 > k > n 


in (29), we obtain: 




to 




ho hi 


■ ■ ■ 




to 











ti 




hi h 2 


... h n 




tl 









H 














= ± 




(32) 




t n -i 




h n -i ■ ■ ■ 


■ ■ ■ h2n~2 




t n -l 




1 





Therefore t = ±g with g the polynomial needed for computing (44)- (47), in addition 
to /. Since C, tp, and i\> are s uch that the extended Euclidean scheme necessitates no 
divisions (see ( Kaltofenl . ll992 , Sec. 2)), we see that both / and g may be computed with 
no divisions. The only remaining division in the algorithm for Proposition 4 is at (38). 
From (31), this division is by fo = 1. 

6.2. Lazy polynomial evaluation and division- free adjoint computation 

We run Algorithm Adjoint with input Z(z) £ R[z] nxn , and start with operations on 
truncated power series modulo z n+1 . We assume that Algorithm Det has been executed, 
and that its intermediary results have been stored. 

Using Proposition 4 and previous section, step i* requires 0(G(n)M(n)) = (X(n 2 ) 
operations in R for computing dH(z) of degree n in R[z] rxs . step ii*, step hi*, and V* 
cost 0(n 2 y/n) operations in K, hence, taking into account the power series operations, 
this gives 0(n 2 M(n)^/n) — 0~(n 3 y/n) operations in R for the division-free version. The 
cost analysis of step iv*, using (26) over power series modulo z n , leads to log 2 r matrix 
products, hence to the time bound <X(n 4 ), greater than the target estimate 0\n 3 y/n). 
As noticed previously, step hi of Algorithm Det only involves polynomials of degree 
0(y/n), while the reversed program for step iv* of Algorithm Adjoint, relies on dB(z) 
whose degree is n. 

Since only Z*(l) = A* is needed, our solution, for restricting the cost to 0"(n 3 y / n), is to 
start evaluating at Z = 1 during step iv*. However, since power series multiplications are 
done modulo z n , this evaluation must be lazy. The fact that matrices Z k (z), 1 < k < r— 1, 
of degree at most r— 1 are involved, enables the following. Let a and c be two polynomials 
such that dega + degc = r— 1 in R[z], and let b be of degree n > r — 2 in R[z]. Considering 
the highest degree part of 6, and evaluating the lowest degree part at z = 1, we define 



b H (z) 
that 



v r-2 



M-r+2 



£ R[z] and 6l = b 



'n-r+l 



bn S R. We then remark 



(a(z)b{z)c(z) mod z n+1 ) (1) = (a{z)(b H (z)z n - r+2 + b L )c(z) mod z n+1 ) (1), 

= (a{z)b H {z)c{z) mod z^ 1 ) (1) + (a(z)b L c(z)) (1). 



(33) 



For modifying step iv*, we follow the definition of bn and 6^, and first compute 
dB H (z) e R[z] nx "ofdegreer-2, &nddB L <E R" x ™. Applying (33), the sum £Li Z r ~ k {z) 
dB{z) ■ Z k ~ 1 (z) may then be evaluated at z = 1 by the program 

Modified step iv*. Z* := Z r - k (z) ■ dB H {z) ■ Z k - X (z) mod z r ~ l ) (1) 

Z* := Z* + (ELi Z r - k (z) ■ dB L ■ Z k -\z)) (1), 
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in 0~(n 3 M(r)) = 0~(n 3 y/n) operations in R. This leads to an intermediary value Z* 6 
R nx " before step v*. The value is updated at step v* with power series operations, 
and a final evaluation at z = 1 in time 0~(n 2 rM(n)) — 0~(n 3 y/n). Since only step iv* 
has been modified, we obtain the following result. 

Theorem 3. Let A e R™ x ™. If Algorithm Adjoint, modified according to (34), is ex- 
ecuted with input Z(z) = C + z(A — C), power series operations modulo z n+1 , and a 
final evaluation at z = 1, then the matrix adjoint A* is computed in 0~{n 3 y/n) operations 
in R. 



7. Concluding remarks 



We have developed an explicit algorithm for computing the matrix adjoint using only 
ring arithmetic operations. The algorithm has complexity estimate (X(n 35 ). It repre- 
sents a practical alternative to previously existing solutions for the problem, that rely on 
automatic differentiation of a determinant algorithm. Our description of the algorithm 
allows direct implementations. It should help understanding how the adjoint is computed 
using Kaltofen's baby steps/giant steps construction. Still, a full mathematical explana- 
ti on deserves to be inve s tigate d. Our work has to be generalized to the block algorithm 
of Kaltofen and Villardl ( 2004 ) (with the use of fast matrix multiplication algorithms) 
whose complexity estimate is currently the best known for computing the determinant, 
and the adjoint without divisions. 



Acknowledgements. We thank Erich Kaltofen who has brought reference lOstrowski et al 
(|197lh to our attention. 



Appendix: Hankel matrix inversion and anti-diagonal sums 

For implementing (17), we study the computation of the anti-diagonal sums gj~ of H^ 1 
and H^ 1 



We first use the formula of lLabahn et al. (1990) for Hankel matrices inversion. The 



minimum polynomial / of {h k }o<k<2n-i is f{x) = fa + fix + . 
satisfies (11). Let the last column of be given by 

H [go, gi, . . . , g n -i] T — [0, . . . , 0, 1} T G K". 

Applying ( Labahn et al. . 1990L Theorem 3.1) with (11) and (35), we know that 



and 
(35) 



li- 



ft 

fn-l 
1 



fit 



(Jo 



9o 



fji 

9n-l 




g n -i o 



fo ■ ■ ■ fn- 







fo 



(36) 



For deriving an analogous formula for H A , using the notations of Section 4.1, we first 
recall that H = JC U JC V and Ha — JC U AJC V . Multiplying (11) on the left by K-uAK.^ 1 gives 

Ha [fo, fi, ■ ■ ■ , fn-i] T = — [h n +i, h n +2, ■ ■ ■ , h2n] T ■ (37) 
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We also notice that 

HaH- 1 = (K.- 1 A T 1C U ) T , 
and, using the action of A T on the vectors u T , . . . , (A T ) n ~ 2 u T , we check that HaH' 1 is 
the companion matrix 



H A H- 



1 

... 

-fo —fi • ■ ■ 





1 

■/n-1 



Hence the last column [g^, g*, . . . , g^-i] of H A X is the first column of H 1 divided by 
—fo- Using (36) for determining the first column of H' 1 , we get 

So, 



l9o,9i, ■ ■ -,9 n -l\ 



fi 



■[/l,---,/n-l>l] + \gi,---,9n-l,0] 



(38) 



Applying (jLabahn et all 1990, Thcorem3.1), now with (37) and (38), we obtain 



fl ■ ■ ■ fn-l 1 

fn-1 ■■' 
1 



9o ■■■ 9n-l 



91 

9n-l 





9n-l 



fo ■ ■ ■ fn 



fo 



(39) 



From (36) and (39) we see that computing UkiH' 1 ) and ak-i{H^ 1 ) 1 for < k < 2n— 1, 
reduces to computing the anti-diagonal sums for a product of triangular Hankel times 
triangular Toeplitz matrices. Let 



M = LR = 



We have 



and 



Iq li 

h 

i+j-2 



ln-1 



ro ri 



r n -2 



ro 



= ^2 Z s r i+i _ s _ 2 , l<i + j-l<n, 



s—i—l 
n-1 



lsn + j- s -2, n<i+j — l<2n—l. 



(40) 



(41) 



s=i— 1 



For < k < 2n — 2, <jfc(M) is defined by summing the rriij's such that i + j — 2 = k. 
Using (40) we obtain 



0*(M) = J2i=l ™>i,k-i+2 = Ei=l Z)a=i-1 k 



= ELo( s + l)Uk-., < fc < n - 1, 



15 



hence 

n-l 



(]T l s x s+1 )'(J2 r s x s ) mod x n = ^ a k (M) x k . (42) 

s=0 s=0 fc=0 

In the same way, using (41) with k = k — n + 2, we have 

a M M J — 2^1=1 m i+fc-l,n-i+l — Z^i=l 2^ s=4 's+fc-2 r n-s> 

= S"=S-i( s + n ~ k) l s r k - s , n~l<k<2n-2, 

and 

n n— 1 n— 1 

l n-s-ix s ) mod = £ <7 2n _ fe _ 2 (M) (43) 

s=l s=0 fc=0 

It remains to apply (42) and (43) to the structured matrix products in (36) and (39), 
for computing the er^if -1 ) and a k {H^ )'s. Together with the minimum polynomial / = 
/o + . ■ . + f r ^ix n - 1 +x n , let g = 5o + . . .+g n ^x n ' 1 (see (35)), and g* = g* . . .+g* n ^ 1 x n - 1 
(see (38)). We may now combine, respectively (36) and (39), with (42), for obtaining 

n-l 

/'s-s'/modi^^a^ff- 1 )^, (44) 

fc=0 

and 

71-1 

f'g* - (g*)'f mod x n = ]T a k (H^) x k . (45) 

fc=0 

Defining also rev(/) = 1 + f n _ 1 x + . . . + f x n , rev(g) = g n -ix + . . .+g Q x n , and rev(g*) = 
9n-\X + ■ ■ ■ + goX n , the combination of, respectively, (36) and (39), with (43), leads to 

n-l 

rev(s)'rev(/) - rev(/)'rev(. 9 ) mod x" = ^ a 2n -k-2( H ) ^ > ( 46 ) 

and 

n-l 

rev( 3 *)'rev(/) - rev(/)'rev( 5 *) mod x n = £ <r 2n ^ 2 {H A ) x k . (47) 

fc=0 

Proposition 4. Assume that the minimum polynomial f and the Hankel matrices H 
and Ha are given. The anti-diagonal sums <7fc(-ff _1 ) and akiH^ 1 ), for < k < 2n — 1, 
can be computed in G(ri) + 0(M(n)) operations in K. 

Using the approach of iBrent et~aH (|l980h we know that computing the last column 
of H^ 1 reduces to an extended Euclidean problem of degree 2n. Hence the polynomial 
g is computed in G(n) + 0(n) operations. From there, g* is computed using (38). Then, 
applying (44)- (47) leads to the cost 0(M(n)). 
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